Abstract

The usual $\chi^2$ method of fit quality assessment is a special case of the more general method of Bayesian model
comparison, which involves integrals of the likelihood and prior over all possible values of all parameters. We
introduce new parametrisations based on systematic expansions around the stretched exponential or
Fourier-transformed Levy source distribution, and utilise the increased discriminating power of the Bayesian approach to
evaluate the relative probability of these models being true representations of recently measured Bose-Einstein
correlation data in $e^+e^-$ annihilations at LEP.



1 Bayes factors 



The Bayesian definition of probability differs radically from the conventional "frequentist" one, necessitating the
overhaul of many concepts and techniques used in statistics and its applications. Since its introduction in 1900
[1], the $\chi^2$ statistic has become the standard criterion for goodness of fit in physics and many other disciplines,
while Laplace's Bayesian approach [2] remained largely forgotten until revived by Jeffreys [3]. Later refinements
such as the Maximum Likelihood occupy a middle ground between the two approaches.

In this contribution, we demonstrate the use of one Bayesian technique in the simple context of fitting or,
more generally, the quantitative assessment of evidence in favour of a hypothesis $H_1$ as a description of given
data, compared to a rival hypothesis $H_2$. We do so by analysing the concrete example of binned data for the
correlation function $C_2(Q)$ in the four-momentum difference $Q = \sqrt{-(p_1 - p_2)^2}$ as published recently by the L3
Collaboration [4].

Suppose we have data $D = \{Q_1, \ldots, Q_n\}$ consisting of $n$ measurements of particle four-momentum differences,
assumed to be mutually independent as is customary in femtoscopy. Typically, the experimentalist will want to test
how well various parametrisations fit the data. For the purposes of Bayesian analysis, a given parametrisation
$y(Q \mid \theta_m)$ with $N_m$ free parameters $\theta_m = \{\theta_{m1}, \theta_{m2}, \ldots, \theta_{mN_m}\}$ is considered a "model" or "hypothesis" $H_m$.
The starting point is the odds in favour of model $H_m$ compared to a different model $H_\ell$, defined as the ratio
$p(H_m \mid D)/p(H_\ell \mid D)$, while the evidence for $H_m$ versus $H_\ell$ is the logarithm (footnote 2) of the odds. Use of Bayes' Theorem
for both hypotheses yields
$$ \frac{p(H_m \mid D)}{p(H_\ell \mid D)} = \frac{p(D \mid H_m)}{p(D \mid H_\ell)}\,\frac{p(H_m)}{p(H_\ell)}. \tag{1} $$

The evidence for $H_m$ versus $H_\ell$ is therefore the same as the Bayes factor $B_{m\ell} = \lg[p(D \mid H_m)/p(D \mid H_\ell)]$ if there
is no a priori reason to prefer $H_m$ above $H_\ell$ and therefore $p(H_m) = p(H_\ell) = 1/2$. A large Bayes factor says that
the evidence for $H_m$ is stronger than the evidence for $H_\ell$ and vice versa. It can be written as the logarithm of a ratio of integrals
over the respective parameter spaces of $\theta_m$ and $\theta_\ell$,
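$$ B_{m\ell} = \lg\frac{p(D \mid H_m)}{p(D \mid H_\ell)} = \lg\frac{\int d\theta_m\, p(D \mid \theta_m, H_m)\, p(\theta_m \mid H_m)}{\int d\theta_\ell\, p(D \mid \theta_\ell, H_\ell)\, p(\theta_\ell \mid H_\ell)}. \tag{2} $$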
Solving the high-dimensional integrals will often be an arduous task. Fortunately, the independence of the
measurements implies that the likelihood $p(D \mid \theta_m, H_m)$ factorises into the product of likelihoods for individual
data points, which by assumption have the same form,

$$ p(D \mid \theta_m, H_m) = \prod_{i=1}^{n} p(Q_i \mid \theta_m, H_m) \approx [p(Q \mid \theta_m, H_m)]^n. \tag{3} $$



Due to the large exponent, even the slightest nonuniformity in $p(Q \mid \theta_m, H_m)$ will lead to the development of
a strong peak in parameter space for the overall likelihood, situated at the maximum likelihood point $\hat\theta_m$. An
asymmetric prior $p(\theta_m \mid H_m)$ will shift the peak to a value $\theta_m^*$, but it will not materially affect the width of the
peak or its differentiability. Unless the shifted peak falls on a boundary of the parameter space or happens to be
nondifferentiable, it can therefore be expanded around $\theta_m^*$ [5]:



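$$ p(D \mid \theta_m, H_m)\, p(\theta_m \mid H_m) \simeq p(D \mid \theta_m^*, H_m)\, p(\theta_m^* \mid H_m)\, \exp\left[-\tfrac{1}{2}(\theta_m - \theta_m^*)\, \mathbf{A}^{-1} (\theta_m - \theta_m^*)\right], \tag{4} $$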
2 We use $\lg = \log_2$; other bases can be substituted as preferred.
where $\mathbf{A}^{-1}$ is the Hessian of the expansion

$$ \mathbf{A}^{-1} = -\left.\frac{\partial^2 \ln[p(D \mid \theta_m, H_m)\, p(\theta_m \mid H_m)]}{\partial\theta_m\, \partial\theta_m^{T}}\right|_{\theta_m = \theta_m^*} \tag{5} $$



and A is the parameter covariance matrix. As more data is accumulated, the peak narrows so that we can neglect 
the fact that parameters may have finite ranges. Integrating the above as if it were a Gaussian, one obtains 
Laplace's result [2] 



$$ \int_{-\infty}^{+\infty} d\theta_m\, p(D \mid \theta_m, H_m)\, p(\theta_m \mid H_m) \simeq p(D \mid \theta_m^*, H_m)\, p(\theta_m^* \mid H_m)\, \sqrt{(2\pi)^{N_m} \det \mathbf{A}_m}, \tag{6} $$

which under the stated assumptions is a good approximation of the full-blown integral appearing in Eq. (2) if
$n > 20 N_m$. The Bayes factor becomes simply the difference

$$ B_{m\ell} = h_\ell - h_m, \tag{7} $$

$$ h_k = -\lg\left[p(D \mid \theta_k^*, H_k)\, p(\theta_k^* \mid H_k)\, \sqrt{(2\pi)^{N_k} \det \mathbf{A}_k}\right]. \tag{8} $$

The evidence $h_k$ can be determined for any single model $H_k$, but has no meaning on its own; only differences $h_\ell - h_m$
are meaningful in quantifying the probability for $H_m$ to be true compared to $H_\ell$,

$$ \frac{p(H_m \mid D)}{p(H_\ell \mid D)} = 2^{\,h_\ell - h_m}. \tag{9} $$
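As a minimal numerical sketch of the Laplace approximation and of Eq. (8), the following toy example (not part of the L3 analysis) uses a binomial-type likelihood with k successes in n trials and a uniform prior on the success probability, for which the evidence integral is known exactly as a Beta function. The function names and the toy data are assumptions made purely for illustration.

```python
import math

LN2 = math.log(2.0)

def h_exact(k, n):
    """-lg of the exact evidence: integral of t^k (1-t)^(n-k) over [0, 1], uniform prior."""
    ln_evidence = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    return -ln_evidence / LN2

def h_laplace(k, n):
    """-lg of the Laplace estimate: likelihood * prior * sqrt(2 pi A) at the peak."""
    t = k / n                          # peak; with a uniform prior it equals the MLE
    ln_like = k * math.log(t) + (n - k) * math.log(1.0 - t)
    ln_prior = 0.0                     # uniform prior density is 1 on [0, 1]
    a = t * (1.0 - t) / n              # inverse of -d^2 ln(likelihood)/dt^2 at the peak
    ln_width = 0.5 * math.log(2.0 * math.pi * a)
    return -(ln_like + ln_prior + ln_width) / LN2

def h_fixed(k, n, t0=0.5):
    """-lg evidence of a rival model without free parameters (t fixed at t0)."""
    return -(k * math.log(t0) + (n - k) * math.log(1.0 - t0)) / LN2

k, n = 30, 100                         # hypothetical toy data
h_free_exact = h_exact(k, n)
h_free_laplace = h_laplace(k, n)
bayes_factor = h_fixed(k, n) - h_free_laplace   # B = h_l - h_m, in bits
odds = 2.0 ** bayes_factor                      # odds for the free model vs the fixed one
print(h_free_exact, h_free_laplace, bayes_factor, odds)
```

In this toy case the exact and Laplace values of h differ by about a hundredth of a bit, which illustrates why the approximation is adequate once the number of data points is large compared to the number of parameters.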



2 Relationship to $\chi^2$ and the Maximum Likelihood

The Bayesian results obtained above differ from the traditional Maximum Likelihood Estimate (MLE), which
ignores the priors $p(\theta_m \mid H_m)$ and approximates the integral (6) by the maxima of the likelihoods,

$$ B_{m\ell}^{\mathrm{MLE}} = \lg\frac{p(D \mid \hat\theta_m, H_m)}{p(D \mid \hat\theta_\ell, H_\ell)}. \tag{10} $$

The traditional $\chi^2$ goodness-of-fit is related to the above as follows. The measurements $\{Q_i\}$ are binned into bins
$b = 1, \ldots, B$ with bin midpoints $Q_b$, yielding the histogram version of the data, $D = \{n_b\}_{b=1}^{B}$ with $\sum_b n_b = n$.
The most general "parametrisation" of the histogram contents is then the multinomial with $\alpha = \{\alpha_b\}_{b=1}^{B}$ the set
of Bernoulli probabilities with $B - 1$ degrees of freedom,

$$ p(\mathbf{n} \mid \alpha, n) = n! \prod_{b=1}^{B} \frac{\alpha_b^{n_b}}{n_b!}, \tag{11} $$
which on use of the Stirling approximation becomes, up to a normalisation constant,

$$ p(\mathbf{n} \mid \alpha, n) = c \cdot \exp\left[\sum_{b=1}^{B} n_b \ln\frac{n\alpha_b}{n_b}\right]. \tag{12} $$

Expanding the free parameters $\alpha$ around the measured data $\mathbf{n}$ and truncating

$$ p(\mathbf{n} \mid \alpha, n) = c \cdot \exp\left[-\sum_{b=1}^{B} \frac{(n\alpha_b - n_b)^2}{2 n_b}\right], \tag{13} $$



we can identify the multinomial quantities with the measured correlation functions at mid-bin points $Q_b$ by
setting $n_b \to I\,C_2(Q_b)$, $C = \sum_b C_2(Q_b)$, and $n \to I\,C$ (footnote 3). The $n_b$ in the denominator is almost equal to the
measured bin variances, $n_b \simeq \sigma^2(n_b) = I^2 \sigma^2(C_2(Q_b))$, so that the quadratic term is

$$ \frac{(n\alpha_b - n_b)^2}{2 n_b} \simeq \frac{[C_2(Q_b) - y(Q_b \mid \theta_m)]^2}{2\,\sigma(C_2(Q_b))^2}, \tag{14} $$



where $n\alpha_b/I \to y(Q_b \mid \theta_m)$, which includes all the constants, is the unnormalised parametrisation for $C_2(Q)$ in
common use. Comparing this to the usual definition

$$ \chi^2 = \sum_{b} \frac{[C_2(Q_b) - y(Q_b \mid \theta_m)]^2}{\sigma(C_2(Q_b))^2}, \tag{15} $$



we see that the maximum likelihood is approximately equal to

$$ p(D \mid \hat\theta_m, H_m) \simeq e^{-\chi^2/2}, \tag{16} $$

so that $\chi^2$ is seen to be an approximation of the Bayes formulation, using only a single point in the parameter
space, $\theta_m = \hat\theta_m$, and thereby effectively assuming a uniform prior. Furthermore, $\chi^2$ truncates the expansion in
(13): this is probably the approximation most vulnerable to criticism.
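The effect of the truncation can be checked numerically. The sketch below compares the Stirling-form exponent, the sum over bins of n_b ln(n alpha_b / n_b), with its quadratic truncation, which equals minus one half of a chi-squared built with bin variances n_b. The bin contents and model probabilities are hypothetical numbers chosen for illustration, not L3 data.

```python
import math

def stirling_exponent(counts, probs):
    """Exponent of the multinomial after Stirling: sum_b n_b ln(n alpha_b / n_b)."""
    n = sum(counts)
    return sum(nb * math.log(n * a / nb) for nb, a in zip(counts, probs))

def quadratic_exponent(counts, probs):
    """Truncated expansion around the data: -sum_b (n alpha_b - n_b)^2 / (2 n_b)."""
    n = sum(counts)
    return -sum((n * a - nb) ** 2 / (2.0 * nb) for nb, a in zip(counts, probs))

def chi_squared(counts, probs):
    """Chi-squared with the bin variance approximated by the bin content n_b."""
    n = sum(counts)
    return sum((nb - n * a) ** 2 / nb for nb, a in zip(counts, probs))

counts = [120, 95, 80, 105]                 # hypothetical bin contents n_b
probs_near = [0.29, 0.25, 0.21, 0.25]       # model close to the data
probs_far = [0.40, 0.20, 0.15, 0.25]        # model far from the data

err_near = abs(stirling_exponent(counts, probs_near) - quadratic_exponent(counts, probs_near))
err_far = abs(stirling_exponent(counts, probs_far) - quadratic_exponent(counts, probs_far))
print(err_near, err_far)
```

For the model close to the data the two exponents agree to a few parts in a thousand, while for the distant model the truncation error is of order one half, so the Gaussian form is least reliable exactly where poor models are being penalised.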



3 $I$ is an arbitrary large integer to ensure that $I\,C_2(Q_b)$ is an integer. As it eventually cancels out, its size is immaterial.
3 Parametrisations and Levy-based polynomial expansions



We now apply the above general ideas to the specific case of the various parametrisations shown in Table 1 for the
correlation function data for two-jet events published by the L3 Collaboration [4]. Hypotheses $H_1$ to $H_3$ are taken
from the L3 paper. Realising that it is important to quantify the degree of deviation of Bose-Einstein correlation
data from the Gaussian or the exponential shape, the L3 Collaboration also studied a "Laguerre expansion" as
well as the symmetric Levy source distribution, characterized by the stretched-exponential correlation function
of hypothesis $H_2$. In $H_4$ and $H_5$, we propose a new expansion technique that measures deviations from $H_2$ in
terms of a series of "Levy polynomials" that are orthogonal to the characteristic function of symmetric Levy
distributions, generalising the results presented in Ref. [6].

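$$ L_1(x \mid \alpha) = \det\begin{pmatrix} \mu_{0,\alpha} & \mu_{1,\alpha} \\ 1 & x \end{pmatrix}, \qquad L_2(x \mid \alpha) = \det\begin{pmatrix} \mu_{0,\alpha} & \mu_{1,\alpha} & \mu_{2,\alpha} \\ \mu_{1,\alpha} & \mu_{2,\alpha} & \mu_{3,\alpha} \\ 1 & x & x^2 \end{pmatrix}, \quad \text{etc.} \tag{17} $$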

where $\mu_{r,\alpha} = \int_0^\infty dx\, x^r f(x \mid \alpha) = \frac{1}{\alpha}\Gamma\!\left(\frac{r+1}{\alpha}\right)$. These reduce, up to a normalisation constant, to the Laguerre
polynomials for $\alpha = 1$. Figure 1 displays two examples for various values of $\alpha$. Polynomials cannot be both
orthogonal and derivatives for transcendental weight functions [9], and therefore in $H_6$ and $H_7$ we also investigated
nonorthogonal derivative functions of the stretched exponential (footnote 4).
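The determinant construction of Eq. (17) can be sketched in a few lines. The code below assumes the weight function f(x | alpha) = exp(-x^alpha) on x >= 0, as suggested by the Figure 1 caption, so that the moments are Gamma((r+1)/alpha)/alpha; the helper names are hypothetical. It returns the polynomial coefficients from a cofactor expansion along the last row and checks orthogonality to lower powers of x through the moments.

```python
import math

def moment(r, alpha):
    """mu_{r,alpha}: integral of x^r exp(-x^alpha) over [0, inf) (assumed weight)."""
    return math.gamma((r + 1) / alpha) / alpha

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def levy_coeffs(order, alpha):
    """Coefficients c_j of x^j in L_order(x | alpha), from the cofactors of the last row."""
    rows = [[moment(i + j, alpha) for j in range(order + 1)] for i in range(order)]
    coeffs = []
    for j in range(order + 1):
        minor = [row[:j] + row[j + 1:] for row in rows]
        coeffs.append((-1) ** (order + j) * det(minor))
    return coeffs

def inner_with_power(coeffs, k, alpha):
    """Integral of L(x) * x^k * exp(-x^alpha), evaluated through the moments."""
    return sum(c * moment(j + k, alpha) for j, c in enumerate(coeffs))

c1_laguerre = levy_coeffs(1, 1.0)     # alpha = 1: x - 1
c2_laguerre = levy_coeffs(2, 1.0)     # alpha = 1: x^2 - 4x + 2
c2_levy = levy_coeffs(2, 0.8)
print(c1_laguerre, c2_laguerre, c2_levy)
```

For alpha = 1 the coefficients are those of x - 1 and x^2 - 4x + 2, which are the first two Laguerre polynomials up to sign and normalisation, in line with the statement in the text.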





Hypothesis                         Functional form                                                                                                          N_m
$H_1$  Gauss                       $\gamma\,[1+\epsilon Q]\,[1+\lambda e^{-R^2 Q^2}]$                                                                       4
$H_2$  Stretched Exponential       $\gamma\,[1+\epsilon Q]\,[1+\lambda e^{-R^\alpha Q^\alpha}]$                                                             5
$H_3$  Simplified $\tau$-model     $\gamma\,[1+\epsilon Q]\,[1+\lambda e^{-R^{2\alpha}Q^{2\alpha}}\cos[\tan(\alpha\pi/2)\,R^{2\alpha}Q^{2\alpha}]]$         5
$H_4$  1st-order Levy polynomial   $\gamma\,[1+\lambda e^{-R^\alpha Q^\alpha}\,[1+c_1 L_1(Q \mid \alpha, R)]]$                                              5
$H_5$  3rd-order Levy polynomial   $\gamma\,[1+\lambda e^{-R^\alpha Q^\alpha}\,[1+c_1 L_1(Q \mid \alpha, R)+c_3 L_3(Q \mid \alpha, R)]]$                    6
$H_6$  1st-order derivative        (not legible in source)                                                                                                  5
$H_7$  3rd-order derivative        (not legible in source)                                                                                                  6

Table 1: Summary of parametrisations tested




Figure 1: Levy polynomials of first and third order times the weight function $e^{-x^\alpha}$ for $\alpha$ = 0.8, 1.0, 1.2, 1.4.



4 Application to L3 binned data 

In Table 2, we show the results of applying the Laplace approximation (6) to the L3 two-jet data, which is provided
in terms of 100 binned values for the correlation function $C_2(Q_b)$ together with standard errors $\sigma(C_2(Q_b))$ in the
range $0 < Q < 4$ GeV. Throughout, we used a Gaussian prior $p(\theta_m \mid H_m)$ with a width which was determined
by numerical integration over one of the L3 data points. To illustrate the contributions of the likelihood, prior
and determinant factors entering $h_m$ in (8), we have listed their logarithmic contributions separately in the three
columns headed L, P and F. These quantities are therefore the building blocks for calculating the odds between
any two competing hypotheses. Thus one can, for example, deduce that the odds for $H_7$ compared to $H_6$ are

4 Note the absence of the $[1+\epsilon Q]$ long-range correction term. L3 demonstrated that this term vanishes if the dip, the non-positive
definiteness of $C_2(Q) - 1$, is taken into account by the parametrisation elsewhere, e.g. by the cosine in $H_3$ and by the first-order
polynomials in $H_4$ and $H_5$, resulting in $\epsilon$ values consistent with zero.
$2^{100.6 - 97.0} \simeq 12 : 1$. Also included in Table 2 are the traditional $\chi^2$ measure (C) and its associated confidence level
(CL).



Hypothesis                         N_m    L       P      F      h_m     C      CL
$H_1$  Gauss                       4      177.8   -3.6   32.2   206.5   2.57   $3.4\times10^{-13}$ %
$H_2$  Stretched Exponential       5      138.5   -0.5   34.0   172.0   2.02   $1.5\times10^{-6}$ %
$H_3$  Simplified $\tau$-model     5      68.2    -3.4   37.0   101.8   1.00   49.1 %
$H_4$  1st-order Levy polynomial   5      66.2    2.2    30.3   98.8    0.97   57.3 %
$H_5$  3rd-order Levy polynomial   6      65.9    3.8    41.6   111.3   0.97   55.7 %
$H_6$  1st-order derivative        5      67.3    4.2    29.1   100.6   0.98   53.0 %
$H_7$  3rd-order derivative        6      60.4    4.9    31.7   97.0    0.89   77.0 %

Table 2: Results of fitting parametrisations listed in Table 1.

Legend: L = $-\lg p(D \mid \theta_m^*, H_m) = \chi^2/(2 \ln 2)$        $h_m$ = L + P + F
        P = $-\lg p(\theta_m^* \mid H_m)$                              C = $\chi^2/(B - N_m)$
        F = $-\lg \sqrt{(2\pi)^{N_m} \det \mathbf{A}}$                 CL = confidence level
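The entries of Table 2 can be combined directly, as the following sketch shows. It copies the table values into a dictionary (the variable and function names are hypothetical), evaluates the odds of Eq. (9) from the tabulated h_m, and recovers the reduced chi-squared C from the L column using the legend relations with B = 100 bins.

```python
import math

B_BINS = 100  # number of bins in the L3 two-jet data, as stated in the text

# (N_m, L, P, F, h_m, C) copied from Table 2
TABLE2 = {
    "H1": (4, 177.8, -3.6, 32.2, 206.5, 2.57),
    "H2": (5, 138.5, -0.5, 34.0, 172.0, 2.02),
    "H3": (5, 68.2, -3.4, 37.0, 101.8, 1.00),
    "H4": (5, 66.2, 2.2, 30.3, 98.8, 0.97),
    "H5": (6, 65.9, 3.8, 41.6, 111.3, 0.97),
    "H6": (5, 67.3, 4.2, 29.1, 100.6, 0.98),
    "H7": (6, 60.4, 4.9, 31.7, 97.0, 0.89),
}

def odds(m, l):
    """Odds for hypothesis m against hypothesis l: 2^(h_l - h_m)."""
    return 2.0 ** (TABLE2[l][4] - TABLE2[m][4])

def chi2_per_dof(m):
    """C = chi^2 / (B - N_m), with chi^2 = 2 ln(2) * L from the legend."""
    n_par, neg_lg_like = TABLE2[m][0], TABLE2[m][1]
    return 2.0 * math.log(2.0) * neg_lg_like / (B_BINS - n_par)

odds_h7_h6 = odds("H7", "H6")   # about 12 : 1
odds_h4_h5 = odds("H4", "H5")   # about 5800 : 1
for name in TABLE2:
    print(name, round(chi2_per_dof(name), 2), TABLE2[name][5])
```

The tabulated h_m values agree with L + P + F to within rounding of the last digit, and the recomputed C values reproduce the C column to two decimals.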

It is inappropriate to generalise conclusions based on one specific dataset with its specific circumstances. The fact 
that in the two-jet L3 data the correlation function C2(Q) drops well below 1.0 for 0.5 < Q < 2 GeV, for example, 
is probably the dominant influence on the goodness of fit. Under this caveat, we make the following observations 
regarding the results shown in Table 2: 

1. At first sight, the Bayes factor and the $\chi^2$ methodologies deliver judgements which are rather similar: $H_7$
is consistently ranked best, while $H_1$ and $H_2$ are ranked worst (least likely). The two methodologies yield
vastly different numbers when one hypothesis is bad. As shown below, there are surprising variations even
among the better ones.

2. The determinant plays an important role. For example, the factor F = 41.6 for $H_5$ is significantly larger than
that of the similar models $H_4$ and $H_6$ even though the three log likelihoods are similar. This can be traced to
the fact that the uncertainty in the parameters for $H_5$ is larger, as expressed in the width of its Gaussian
(4). While $\chi^2$, based only on the likelihood, can hardly distinguish between $H_4$ and $H_5$, the contribution
of the large $H_5$ determinant ensures that the Bayesian odds for $H_4$ versus $H_5$ are 5800:1. In other words,
by taking into account not only the best parameter values $\theta^*$ but also their uncertainties, the Bayes factor
could distinguish what $\chi^2$ could not.

3. Our Bayes factor calculation takes the experimental standard errors $\sigma(C_2(Q_b))$ into account by using (14) in
the exponent of the likelihood; in other words, we assume that they are Gaussian. We can improve on this
approximation by doing a more complete Bayesian analysis using not the binned data but the pair momenta
$\{Q_i\}$ themselves.

4. As Fig. 1 shows, the Levy polynomials introduced here are well suited to describe one-sided strongly-peaked
data. It may be helpful to use them, as we have done here, merely as part of parametrisations of data to
which they show some resemblance. More systematic use in Gram-Charlier or other expansions will be faced
with issues inherent in all asymptotic series [7, 8].

5 Conclusions 

1. In hypotheses H4 to H7, we have presented new techniques to study deviations from a stretched exponential 
or Fourier-transformed Levy shape. Details will be published elsewhere. 

2. The standard measures of fit quality like $\chi^2$ or CL are useful in rejecting models which are inconsistent with
a given dataset. Where two or more models are consistent with the data, however, they are unable to select
the more probable. The Bayes factor (9) permits quantification of the evidence (relative probability) for the
validity of models.

3. Besides the likelihood, the prior and determinant also play a role, sometimes decisively so. 

4. The Laplace approximation (6) is usually fairly accurate, but the assumption of Gaussian errors for count
data (13), which is made by truncation of the Taylor expansion in the data, is of dubious quality.

5. By integrating over parameter space, Bayesian evidence takes into account all possible values of the
parameters, while $\chi^2$ and Maximum Likelihood do not.

6. Bayes factors depend linearly on the two priors. This is good in that they are made explicit, but bad in the 
sense that results can and do change depending on the choice of priors. 

7. The omission of priors in $\chi^2$ is to its disadvantage as it discards important information.

8. It may appear that $\chi^2$ does not need any alternative hypothesis to be of use. This is not so, however: the
alternative implicit in $\chi^2$ is the "Bernoulli class" of multinomials [10].